-
Notifications
You must be signed in to change notification settings - Fork 5.1k
[iOS][Globalization] Simple IndexOf
support with CompareOptions.IgnoreSymbols
#118523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Tagging subscribers to this area: @dotnet/area-system-globalization |
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements CompareOptions.IgnoreSymbols
support for iOS's hybrid globalization by adding symbol removal functionality using regex patterns. The implementation preprocesses strings to remove symbols, performs comparisons on cleaned strings, and maps results back to original string coordinates for IndexOf operations.
Key Changes:
- Added regex-based symbol removal functions with position mapping for IndexOf operations
- Extended all string comparison functions (Compare, IndexOf, StartsWith, EndsWith) to support IgnoreSymbols
- Enabled previously disabled tests by removing iOS platform exclusions for IgnoreSymbols functionality
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
File | Description |
---|---|
src/native/libs/System.Globalization.Native/pal_collation.m |
Core implementation adding symbol removal functions and IgnoreSymbols support to comparison operations |
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.iOS.cs |
Updated supported comparison options to include IgnoreSymbols |
src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.Compare.cs |
Removed iOS platform exclusions for IgnoreSymbols tests |
src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.IndexOf.cs |
Removed iOS platform exclusions for IgnoreSymbols tests |
src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.IsPrefix.cs |
Updated test expectations and removed some iOS platform exclusions |
src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.IsSuffix.cs |
Removed iOS platform exclusions for IgnoreSymbols tests |
src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.LastIndexOf.cs |
Removed iOS platform exclusions for IgnoreSymbols tests |
if (mapping == nil || mapping.count == 0 || modifiedRange.location >= mapping.count) | ||
return invalidRange; | ||
|
||
int32_t mappedLocation = [mapping[(NSUInteger)modifiedRange.location] intValue]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculation assumes a contiguous range in the original string, but when symbols are removed, the characters may not be adjacent in the original string. The mapped length should account for all characters between the start and end positions, including any removed symbols in between.
Copilot uses AI. Check for mistakes.
int32_t mappedLocation = [mapping[(NSUInteger)modifiedRange.location] intValue]; | ||
|
||
// Calculate the mapped length by finding the end position in the original string | ||
NSUInteger endIndex = modifiedRange.location + modifiedRange.length - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculation assumes a contiguous range in the original string, but when symbols are removed, the characters may not be adjacent in the original string. The mapped length should account for all characters between the start and end positions, including any removed symbols in between.
Copilot uses AI. Check for mistakes.
if (endIndex >= mapping.count) | ||
return invalidRange; | ||
|
||
int32_t mappedEndLocation = [mapping[endIndex] intValue]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculation assumes a contiguous range in the original string, but when symbols are removed, the characters may not be adjacent in the original string. The mapped length should account for all characters between the start and end positions, including any removed symbols in between.
int32_t mappedEndLocation = [mapping[endIndex] intValue]; | |
int32_t mappedEndLocation = [mapping[endIndex] intValue]; | |
// The mapped length should cover all characters between mappedLocation and mappedEndLocation, including any removed symbols. |
Copilot uses AI. Check for mistakes.
if (endIndex >= mapping.count) | ||
return invalidRange; | ||
|
||
int32_t mappedEndLocation = [mapping[endIndex] intValue]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculation assumes a contiguous range in the original string, but when symbols are removed, the characters may not be adjacent in the original string. The mapped length should account for all characters between the start and end positions, including any removed symbols in between.
int32_t mappedEndLocation = [mapping[endIndex] intValue]; | |
int32_t mappedEndLocation = [mapping[endIndex] intValue]; | |
// mappedLength should include all characters between mappedLocation and mappedEndLocation in the original string, | |
// including any removed symbols in between. |
Copilot uses AI. Check for mistakes.
yield return new object[] { s_invariantCompare, "\uD800\uDC00", "\uD800", CompareOptions.None, true, 1 }; | ||
yield return new object[] { s_invariantCompare, "\uD800\uDC00", "\uD800", CompareOptions.IgnoreCase, true, 1 }; | ||
} | ||
else | ||
{ | ||
yield return new object[] { s_hungarianCompare, "dzsdzsfoobar", "ddzsf", CompareOptions.None, false, 0 }; | ||
if (PlatformDetection.IsNotHybridGlobalizationOnApplePlatform) | ||
yield return new object[] { s_invariantCompare, "''Tests", "Tests", CompareOptions.IgnoreSymbols, false, 0 }; | ||
yield return new object[] { s_invariantCompare, "''Tests", "Tests", CompareOptions.IgnoreSymbols, PlatformDetection.IsHybridGlobalizationOnApplePlatform ? true : false, 0 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment with the issue link tracking the wrong results we have in this case?
// \p{S} - Symbols (currency, mathematical, modifier, and other symbols) | ||
// \p{Z} - Separators (space, line, and paragraph separators) | ||
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[\\p{P}\\p{S}\\p{Z}]" options:NSRegularExpressionCaseInsensitive error:&error]; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would make sense to factor this to its own inlined method? I am seeing using more than once
NSString *charString = [NSString stringWithCharacters:&ch length:1]; | ||
|
||
// Check if this character matches the symbol pattern | ||
NSRange matchRange = [regex rangeOfFirstMatchInString:charString options:0 range:NSMakeRange(0, 1)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naive implementation of
CompareOptions.IgnoreSymbols
forIndexOf
on iOS. Because ObjectiveC string APIs (https://developer.apple.com/documentation/foundation/nsstring/compareoptions?language=objc) don't provide a direct alternative toIgnoreSymbols
option, we use regex to remove the symbols from the strings.The implementation works in a following:
Also small refactoring of the iOS globalization codebase and enabling more tests.
#111895